-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[query] Truncate search results to prevent OOM on jaeger-query. #1202
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR.
@@ -237,7 +237,10 @@ func (s *SpanReader) FindTraces(ctx context.Context, traceQuery *spanstore.Trace | |||
} | |||
if traceQuery.NumTraces == 0 { | |||
traceQuery.NumTraces = defaultNumTraces | |||
} else if traceQuery.NumTraces > 10000 { | |||
traceQuery.NumTraces = 10000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this hardcoded upper limit? It should be user-configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. Also -
Going through the current traceQueryParameters()
https://github.com/jaegertracing/jaeger/blob/master/cmd/query/app/query_parser.go#L139 - NumTraces
seems to be defined as defaultQueryLimit
but in its implementation it's used when NumTraces
is not passed explicitly - https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/cassandra/spanstore/reader.go#L238
Should this be renamed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^ This is fine. My bad, confused the two.
So I'm a little confused. Each query can already be parameterised with a limit on number of traces - https://github.com/jaegertracing/jaeger/blob/master/cmd/query/app/query_parser.go#L139 (if not specified then a default is used) - doesn't this resolve #1051? If no, then why not? |
The original ticket was about individual traces with large number of spans (we've seen traces with tens of millions of spans). |
@yurishkuro OK. In that case, I will drop the following changes to
But will retain the following
Should |
29a998d
to
f67231c
Compare
Codecov Report
@@ Coverage Diff @@
## master #1202 +/- ##
==========================================
- Coverage 100% 99.93% -0.07%
==========================================
Files 160 160
Lines 7181 7189 +8
==========================================
+ Hits 7181 7184 +3
- Misses 0 4 +4
- Partials 0 1 +1
Continue to review full report at Codecov.
|
f67231c
to
e73dcd4
Compare
PR updated. Need to figure out a way parameterise |
@yurishkuro could you please re-run Travis on this? |
There is a git conflict.It needs to be rebased. |
Signed-off-by: Annanay <[email protected]>
Signed-off-by: Annanay <[email protected]>
Signed-off-by: Annanay <[email protected]>
e73dcd4
to
a509e7d
Compare
@pavolloffay I've rebased and removed conflicts. Also, if we want to allow users to force retrieval of more than 10k spans for a trace, we'd have to make changes to the |
@@ -30,12 +30,14 @@ import ( | |||
|
|||
const ( | |||
defaultQueryLimit = 100 | |||
defaultSpansLimit = 10000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there was no default previously, we should not introduce one. I.e. default should be 0, which is == unlimited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the very last comment
|
||
operationParam = "operation" | ||
tagParam = "tag" | ||
tagsParam = "tags" | ||
startTimeParam = "start" | ||
limitParam = "limit" | ||
limitSpansParam = "limitSpans" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/limitSpans/maxSpansPerTrace/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the very last comment
@@ -105,6 +107,16 @@ func (p *queryParser) parse(r *http.Request) (*traceQueryParameters, error) { | |||
limit = int(limitParsed) | |||
} | |||
|
|||
limitSpansParam := r.FormValue(limitSpansParam) | |||
limitSpans := defaultSpansLimit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scope the string var to the if statement:
var maxSpansPerTrace int
if maxSpansParam := r.FormValue(maxSpansPerTraceParam); maxSpansParam != "" {
}
@@ -269,7 +269,7 @@ func (s *SpanReader) multiRead(ctx context.Context, traceIDs []string, startTime | |||
if val, ok := searchAfterTime[traceID]; ok { | |||
nextTime = val | |||
} | |||
searchRequests[i] = elastic.NewSearchRequest().IgnoreUnavailable(true).Type(spanType).Source(elastic.NewSearchSource().Query(query).Size(defaultDocCount).Sort("startTime", true).SearchAfter(nextTime)) | |||
searchRequests[i] = elastic.NewSearchRequest().IgnoreUnavailable(true).Type(spanType).Source(elastic.NewSearchSource().Query(query).Size(defaultDocCount).TerminateAfter(numSpans).Sort("startTime", true).SearchAfter(nextTime)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the difference between Size() and TerminateAfter()?
@@ -54,6 +54,7 @@ const ( | |||
|
|||
defaultDocCount = 10000 // the default elasticsearch allowed limit | |||
defaultNumTraces = 100 | |||
defaultNumSpans = 10000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this has no effect currently since multi-read is already bound by defaultDocCount
@@ -51,4 +51,5 @@ type TraceQueryParameters struct { | |||
DurationMin time.Duration | |||
DurationMax time.Duration | |||
NumTraces int | |||
NumSpans int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
however, why does this need to be passed from the interface? I think the overall feature is intended to protect storage from OOM, so the parameterization should be done via a parameter when instantiating the storage, not at query time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @yurishkuro, I think this particular issue was filed for OOM on the jaeger-query nodes, and we have a separate issue for handling overloading of the underlying storage - #960
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing it in the storage will address query-service OOM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Will look into. Closing this PR.
So do we want to restrict number of spans (per trace) at ingestion? @yurishkuro
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at ingestion (which isn't possible anyway since spans arrive async). We want the limit to be applied at query time, similar to the changes you made to plugin/storage/es/spanstore/reader.go
, but the value of the limit to be configurable as a parameter of the query-service, passed at Reader construction from CLI, not as a parameter of the HTTP request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry its taking so many iterations to get this one right. Understood now, sending in a new, clean PR.
Resolves #1051.
Changes:
Question: Do we need a flag to mark the result as truncated on the UI?
Signed-off-by: Annanay [email protected]